Goto

Collaborating Authors

 cumulative payoff


Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs

Neural Information Processing Systems

In linear stochastic bandits, it is commonly assumed that pa yoffs are with sub-Gaussian noises. In this paper, under a weaker assumption on noises, we study the problem of lin ear stochastic b andits with he avy-t ailed payoffs (LinBET), where the distributions have finite moments of order 1+ ฯต,f o rs o m e ฯต (0, 1] .W e rigorously analyze the regret lower bound of LinBET as โ„ฆ( T


Selection pressure/Noise driven cooperative behaviour in the thermodynamic limit of repeated games

arXiv.org Artificial Intelligence

Consider the scenario where an infinite number of players (i.e., the \textit{thermodynamic} limit) find themselves in a Prisoner's dilemma type situation, in a \textit{repeated} setting. Is it reasonable to anticipate that, in these circumstances, cooperation will emerge? This paper addresses this question by examining the emergence of cooperative behaviour, in the presence of \textit{noise} (or, under \textit{selection pressure}), in repeated Prisoner's Dilemma games, involving strategies such as \textit{Tit-for-Tat}, \textit{Always Defect}, \textit{GRIM}, \textit{Win-Stay, Lose-Shift}, and others. To analyze these games, we employ a numerical Agent-Based Model (ABM) and compare it with the analytical Nash Equilibrium Mapping (NEM) technique, both based on the \textit{1D}-Ising chain. We use \textit{game magnetization} as an indicator of cooperative behaviour. A significant finding is that for some repeated games, a discontinuity in the game magnetization indicates a \textit{first}-order \textit{selection pressure/noise}-driven phase transition. The phase transition is particular to strategies where players do not severely punish a single defection. We also observe that in these particular cases, the phase transition critically depends on the number of \textit{rounds} the game is played in the thermodynamic limit. For all five games, we find that both ABM and NEM, in conjunction with game magnetization, provide crucial inputs on how cooperative behaviour can emerge in an infinite-player repeated Prisoner's dilemma game.


Multi-agent Cooperation in Diverse Population Games

Neural Information Processing Systems

We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data.


Multi-agent Cooperation in Diverse Population Games

Neural Information Processing Systems

We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data.


Multi-agent Cooperation in Diverse Population Games

Neural Information Processing Systems

We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases tothe initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data.